Overview

Dataset statistics

Number of variables: 16
Number of observations: 4240
Missing cells: 715
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 530.1 KiB
Average record size in memory: 128.0 B
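
The two derived figures above can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is the total memory divided by the number of observations; both are assumptions about how the report computes them, although they do reproduce the reported values.

```python
# Figures copied from the dataset statistics above.
n_variables = 16
n_observations = 4240
missing_cells = 715
total_size_kib = 530.1

# Assumption: percentages are relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, averaged over rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```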

Variable types

Categorical: 8
Numeric: 8

Warnings

currentSmoker is highly correlated with cigsPerDay (High correlation)
cigsPerDay is highly correlated with currentSmoker (High correlation)
prevalentHyp is highly correlated with Systolic BP and 1 other field (High correlation)
diabetes is highly correlated with glucose (High correlation)
Systolic BP is highly correlated with prevalentHyp and 1 other field (High correlation)
Diastolic BP is highly correlated with prevalentHyp and 1 other field (High correlation)
glucose is highly correlated with diabetes (High correlation)
Diastolic BP is highly correlated with Systolic BP and 1 other field (High correlation)
Systolic BP is highly correlated with Diastolic BP and 1 other field (High correlation)
prevalentHyp is highly correlated with Diastolic BP and 1 other field (High correlation)
education has 110 (2.6%) missing values (Missing)
BP Meds has 60 (1.4%) missing values (Missing)
tot cholesterol has 60 (1.4%) missing values (Missing)
glucose has 391 (9.2%) missing values (Missing)
cigsPerDay has 2145 (50.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-05-26 06:17:43.090727
Analysis finished: 2021-05-26 06:18:25.397713
Duration: 42.31 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 8
Missing (%): 0.2%
Memory size: 33.2 KiB

Female: 2414
Male: 1818

Length

Max length: 6
Median length: 6
Mean length: 5.140831758
Min length: 4

Characters and Unicode

Total characters: 21756
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
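
The length and character totals for Gender follow directly from the category counts. A minimal sketch, assuming missing values are excluded from the character statistics (the helper name `length_summary` is hypothetical):

```python
# Category counts reported for Gender; the 8 missing values are left out.
counts = {"Female": 2414, "Male": 1818}

def length_summary(counts):
    """Return (total characters, mean length, min length, max length)."""
    n_values = sum(counts.values())
    total_chars = sum(len(label) * n for label, n in counts.items())
    lengths = [len(label) for label in counts]
    return total_chars, total_chars / n_values, min(lengths), max(lengths)

total_chars, mean_len, min_len, max_len = length_summary(counts)
print(total_chars, round(mean_len, 9), min_len, max_len)
```

Under that assumption the result agrees with the reported total of 21756 characters and the mean length of about 5.14.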

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Female
3rd row: Male
4th row: Female
5th row: Female

Common Values

Value      Count  Frequency (%)
Female     2414   56.9%
Male       1818   42.9%
(Missing)  8      0.2%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value   Count  Frequency (%)
female  2414   57.0%
male    1818   43.0%

Most occurring characters

Value  Count  Frequency (%)
e      6646   30.5%
a      4232   19.5%
l      4232   19.5%
F      2414   11.1%
m      2414   11.1%
M      1818   8.4%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  17524  80.5%
Uppercase Letter  4232   19.5%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      6646   37.9%
a      4232   24.1%
l      4232   24.1%
m      2414   13.8%

Uppercase Letter
Value  Count  Frequency (%)
F      2414   57.0%
M      1818   43.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  21756  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e      6646   30.5%
a      4232   19.5%
l      4232   19.5%
F      2414   11.1%
m      2414   11.1%
M      1818   8.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  21756  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e      6646   30.5%
a      4232   19.5%
l      4232   19.5%
F      2414   11.1%
m      2414   11.1%
M      1818   8.4%

age
Real number (ℝ≥0)

Distinct: 39
Distinct (%): 0.9%
Missing: 2
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.57928268
Minimum: 32
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 32
5-th percentile: 37
Q1: 42
median: 49
Q3: 56
95-th percentile: 64
Maximum: 70
Range: 38
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.572874944
Coefficient of variation (CV): 0.1729124441
Kurtosis: -0.989382703
Mean: 49.57928268
Median Absolute Deviation (MAD): 7
Skewness: 0.2289804971
Sum: 210117
Variance: 73.49418481
Monotonicity: Not monotonic
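
Several of these statistics are simple functions of the others. As a rough sketch, using the values reported for age and the usual textbook definitions (assumed here, not confirmed from the report's source):

```python
# Values copied from the quantile and descriptive statistics for `age`.
minimum, q1, q3, maximum = 32, 42, 56, 70
mean, std = 49.57928268, 8.572874944

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values
cv = std / mean                  # coefficient of variation (unitless)
variance = std ** 2              # variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), round(variance, 4))
```

The same relationships hold for the other numeric variables in this report, so they are a quick sanity check when reading any of the tables.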
[Figure: histogram with fixed size bins (bins=39)]

Common values

Value              Count  Frequency (%)
40                 192    4.5%
46                 182    4.3%
42                 180    4.2%
41                 174    4.1%
48                 173    4.1%
39                 170    4.0%
44                 166    3.9%
45                 162    3.8%
43                 158    3.7%
52                 149    3.5%
Other values (29)  2532   59.7%

Minimum 10 values

Value  Count  Frequency (%)
32     1      < 0.1%
33     5      0.1%
34     18     0.4%
35     42     1.0%
36     84     2.0%
37     92     2.2%
38     144    3.4%
39     170    4.0%
40     192    4.5%
41     174    4.1%

Maximum 10 values

Value  Count  Frequency (%)
70     2      < 0.1%
69     7      0.2%
68     18     0.4%
67     45     1.1%
66     38     0.9%
65     57     1.3%
64     93     2.2%
63     110    2.6%
62     99     2.3%
61     110    2.6%

education
Categorical

MISSING

Distinct: 4
Distinct (%): 0.1%
Missing: 110
Missing (%): 2.6%
Memory size: 33.2 KiB

1.0: 1717
2.0: 1252
3.0: 688
4.0: 473

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12390
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 2.0
3rd row: 1.0
4th row: 3.0
5th row: 3.0

Common Values

Value      Count  Frequency (%)
1.0        1717   40.5%
2.0        1252   29.5%
3.0        688    16.2%
4.0        473    11.2%
(Missing)  110    2.6%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
1.0    1717   41.6%
2.0    1252   30.3%
3.0    688    16.7%
4.0    473    11.5%

Most occurring characters

Value  Count  Frequency (%)
.      4130   33.3%
0      4130   33.3%
1      1717   13.9%
2      1252   10.1%
3      688    5.6%
4      473    3.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8260   66.7%
Other Punctuation  4130   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4130   50.0%
1      1717   20.8%
2      1252   15.2%
3      688    8.3%
4      473    5.7%

Other Punctuation
Value  Count  Frequency (%)
.      4130   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12390  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
.      4130   33.3%
0      4130   33.3%
1      1717   13.9%
2      1252   10.1%
3      688    5.6%
4      473    3.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12390  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.      4130   33.3%
0      4130   33.3%
1      1717   13.9%
2      1252   10.1%
3      688    5.6%
4      473    3.8%

currentSmoker
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 3
Missing (%): 0.1%
Memory size: 33.2 KiB

0.0: 2143
1.0: 2094

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12711
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count  Frequency (%)
0.0        2143   50.5%
1.0        2094   49.4%
(Missing)  3      0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
0.0    2143   50.6%
1.0    2094   49.4%

Most occurring characters

Value  Count  Frequency (%)
0      6380   50.2%
.      4237   33.3%
1      2094   16.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8474   66.7%
Other Punctuation  4237   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      6380   75.3%
1      2094   24.7%

Other Punctuation
Value  Count  Frequency (%)
.      4237   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12711  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      6380   50.2%
.      4237   33.3%
1      2094   16.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12711  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      6380   50.2%
.      4237   33.3%
1      2094   16.5%

cigsPerDay
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 33
Distinct (%): 0.8%
Missing: 31
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.001900689
Minimum: 0
Maximum: 70
Zeros: 2145
Zeros (%): 50.6%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 30
Maximum: 70
Range: 70
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.92074175
Coefficient of variation (CV): 1.324247196
Kurtosis: 1.023052615
Mean: 9.001900689
Median Absolute Deviation (MAD): 0
Skewness: 1.247912038
Sum: 37889
Variance: 142.1040838
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=33)]

Common values

Value              Count  Frequency (%)
0                  2145   50.6%
20                 734    17.3%
30                 217    5.1%
15                 210    5.0%
10                 143    3.4%
9                  130    3.1%
5                  120    2.8%
3                  100    2.4%
40                 80     1.9%
1                  67     1.6%
Other values (23)  263    6.2%

Minimum 10 values

Value  Count  Frequency (%)
0      2145   50.6%
1      67     1.6%
2      18     0.4%
3      100    2.4%
4      9      0.2%
5      120    2.8%
6      18     0.4%
7      12     0.3%
8      11     0.3%
9      130    3.1%

Maximum 10 values

Value  Count  Frequency (%)
70     1      < 0.1%
60     11     0.3%
50     6      0.1%
45     3      0.1%
43     56     1.3%
40     80     1.9%
38     1      < 0.1%
35     22     0.5%
30     217    5.1%
29     1      < 0.1%

BP Meds
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 60
Missing (%): 1.4%
Memory size: 33.2 KiB

0.0: 4056
1.0: 124

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12540
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        4056   95.7%
1.0        124    2.9%
(Missing)  60     1.4%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
0.0    4056   97.0%
1.0    124    3.0%

Most occurring characters

Value  Count  Frequency (%)
0      8236   65.7%
.      4180   33.3%
1      124    1.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8360   66.7%
Other Punctuation  4180   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8236   98.5%
1      124    1.5%

Other Punctuation
Value  Count  Frequency (%)
.      4180   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12540  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      8236   65.7%
.      4180   33.3%
1      124    1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12540  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      8236   65.7%
.      4180   33.3%
1      124    1.0%

prevalentStroke
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 9
Missing (%): 0.2%
Memory size: 33.2 KiB

0.0: 4206
1.0: 25

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12693
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        4206   99.2%
1.0        25     0.6%
(Missing)  9      0.2%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
0.0    4206   99.4%
1.0    25     0.6%

Most occurring characters

Value  Count  Frequency (%)
0      8437   66.5%
.      4231   33.3%
1      25     0.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8462   66.7%
Other Punctuation  4231   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8437   99.7%
1      25     0.3%

Other Punctuation
Value  Count  Frequency (%)
.      4231   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12693  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      8437   66.5%
.      4231   33.3%
1      25     0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12693  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      8437   66.5%
.      4231   33.3%
1      25     0.2%

prevalentHyp
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 33.2 KiB

0.0: 2922
1.0: 1316

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12714
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        2922   68.9%
1.0        1316   31.0%
(Missing)  2      < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
0.0    2922   68.9%
1.0    1316   31.1%

Most occurring characters

Value  Count  Frequency (%)
0      7160   56.3%
.      4238   33.3%
1      1316   10.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8476   66.7%
Other Punctuation  4238   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      7160   84.5%
1      1316   15.5%

Other Punctuation
Value  Count  Frequency (%)
.      4238   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12714  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      7160   56.3%
.      4238   33.3%
1      1316   10.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12714  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      7160   56.3%
.      4238   33.3%
1      1316   10.4%

diabetes
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 33.2 KiB

0.0: 4129
1.0: 109

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12714
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        4129   97.4%
1.0        109    2.6%
(Missing)  2      < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
0.0    4129   97.4%
1.0    109    2.6%

Most occurring characters

Value  Count  Frequency (%)
0      8367   65.8%
.      4238   33.3%
1      109    0.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8476   66.7%
Other Punctuation  4238   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8367   98.7%
1      109    1.3%

Other Punctuation
Value  Count  Frequency (%)
.      4238   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12714  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      8367   65.8%
.      4238   33.3%
1      109    0.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12714  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      8367   65.8%
.      4238   33.3%
1      109    0.9%

tot cholesterol
Real number (ℝ≥0)

MISSING

Distinct: 248
Distinct (%): 5.9%
Missing: 60
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 236.6772727
Minimum: 107
Maximum: 696
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 107
5-th percentile: 170
Q1: 206
median: 234
Q3: 263
95-th percentile: 312
Maximum: 696
Range: 589
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 44.61609832
Coefficient of variation (CV): 0.1885102773
Kurtosis: 4.131474784
Mean: 236.6772727
Median Absolute Deviation (MAD): 29
Skewness: 0.8736339696
Sum: 989311
Variance: 1990.596229
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
240                 85     2.0%
220                 70     1.7%
260                 62     1.5%
210                 61     1.4%
232                 59     1.4%
250                 56     1.3%
200                 56     1.3%
230                 54     1.3%
225                 54     1.3%
205                 53     1.2%
Other values (238)  3570   84.2%
(Missing)           60     1.4%

Minimum 10 values

Value  Count  Frequency (%)
107    1      < 0.1%
113    1      < 0.1%
119    1      < 0.1%
124    1      < 0.1%
126    1      < 0.1%
129    1      < 0.1%
133    1      < 0.1%
135    2      < 0.1%
137    1      < 0.1%
140    2      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
696    1      < 0.1%
600    1      < 0.1%
464    1      < 0.1%
453    1      < 0.1%
439    1      < 0.1%
432    1      < 0.1%
410    3      0.1%
405    1      < 0.1%
398    1      < 0.1%
392    1      < 0.1%

Systolic BP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 234
Distinct (%): 5.5%
Missing: 4
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.3623702
Minimum: 83.5
Maximum: 295
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 83.5
5-th percentile: 104
Q1: 117
median: 128
Q3: 144
95-th percentile: 175
Maximum: 295
Range: 211.5
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 22.03924407
Coefficient of variation (CV): 0.1665068708
Kurtosis: 2.15409061
Mean: 132.3623702
Median Absolute Deviation (MAD): 13
Skewness: 1.144615741
Sum: 560687
Variance: 485.7282791
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
120                 107    2.5%
130                 102    2.4%
110                 96     2.3%
125                 88     2.1%
115                 88     2.1%
124                 83     2.0%
122                 80     1.9%
126                 73     1.7%
128                 73     1.7%
123                 72     1.7%
Other values (224)  3374   79.6%

Minimum 10 values

Value  Count  Frequency (%)
83.5   2      < 0.1%
85     1      < 0.1%
85.5   1      < 0.1%
90     2      < 0.1%
92     1      < 0.1%
92.5   2      < 0.1%
93     2      < 0.1%
93.5   2      < 0.1%
94     3      0.1%
95     7      0.2%

Maximum 10 values

Value  Count  Frequency (%)
295    1      < 0.1%
248    1      < 0.1%
244    1      < 0.1%
243    1      < 0.1%
235    1      < 0.1%
232    1      < 0.1%
230    1      < 0.1%
220    2      < 0.1%
217    1      < 0.1%
215    3      0.1%

Diastolic BP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 146
Distinct (%): 3.4%
Missing: 5
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.90188902
Minimum: 48
Maximum: 142.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 48
5-th percentile: 66
Q1: 75
median: 82
Q3: 90
95-th percentile: 104.65
Maximum: 142.5
Range: 94.5
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.9144673
Coefficient of variation (CV): 0.1437176818
Kurtosis: 1.273033246
Mean: 82.90188902
Median Absolute Deviation (MAD): 7.5
Skewness: 0.7126932766
Sum: 351089.5
Variance: 141.9545311
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
80                  262    6.2%
82                  152    3.6%
85                  137    3.2%
70                  135    3.2%
81                  129    3.0%
84                  122    2.9%
90                  118    2.8%
78                  116    2.7%
87                  113    2.7%
86                  108    2.5%
Other values (136)  2843   67.1%

Minimum 10 values

Value  Count  Frequency (%)
48     1      < 0.1%
50     1      < 0.1%
51     1      < 0.1%
52     2      < 0.1%
53     1      < 0.1%
54     1      < 0.1%
55     3      0.1%
56     2      < 0.1%
57     6      0.1%
57.5   3      0.1%

Maximum 10 values

Value  Count  Frequency (%)
142.5  1      < 0.1%
140    1      < 0.1%
136    2      < 0.1%
135    2      < 0.1%
133    2      < 0.1%
132    1      < 0.1%
130    5      0.1%
129    1      < 0.1%
128    1      < 0.1%
127.5  1      < 0.1%

BMI
Real number (ℝ≥0)

Distinct: 1363
Distinct (%): 32.3%
Missing: 24
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.79891603
Minimum: 15.54
Maximum: 56.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 15.54
5-th percentile: 20.06
Q1: 23.07
median: 25.395
Q3: 28.04
95-th percentile: 32.7725
Maximum: 56.8
Range: 41.26
Interquartile range (IQR): 4.97

Descriptive statistics

Standard deviation: 4.075256108
Coefficient of variation (CV): 0.1579622998
Kurtosis: 2.666443184
Mean: 25.79891603
Median Absolute Deviation (MAD): 2.485
Skewness: 0.982521942
Sum: 108768.23
Variance: 16.60771235
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
23.48                18     0.4%
22.54                18     0.4%
22.91                18     0.4%
22.19                18     0.4%
25.09                16     0.4%
23.09                16     0.4%
23.1                 13     0.3%
22.73                13     0.3%
25.23                13     0.3%
21.51                12     0.3%
Other values (1353)  4061   95.8%
(Missing)            24     0.6%

Minimum 10 values

Value  Count  Frequency (%)
15.54  1      < 0.1%
15.96  1      < 0.1%
16.48  1      < 0.1%
16.59  2      < 0.1%
16.61  1      < 0.1%
16.69  1      < 0.1%
16.71  1      < 0.1%
16.73  1      < 0.1%
16.75  1      < 0.1%
16.87  1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
56.8   1      < 0.1%
51.28  1      < 0.1%
45.8   1      < 0.1%
45.79  1      < 0.1%
44.71  1      < 0.1%
44.55  1      < 0.1%
44.27  1      < 0.1%
44.09  1      < 0.1%
43.69  1      < 0.1%
43.67  1      < 0.1%

heartRate
Real number (ℝ≥0)

Distinct: 72
Distinct (%): 1.7%
Missing: 4
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.86779981
Minimum: 44
Maximum: 143
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 44
5-th percentile: 60
Q1: 68
median: 75
Q3: 83
95-th percentile: 98
Maximum: 143
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.99948806
Coefficient of variation (CV): 0.1581631218
Kurtosis: 0.8483320778
Mean: 75.86779981
Median Absolute Deviation (MAD): 7
Skewness: 0.630294538
Sum: 321376
Variance: 143.9877138
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
75                 563    13.3%
80                 384    9.1%
70                 305    7.2%
60                 231    5.4%
85                 228    5.4%
72                 222    5.2%
65                 196    4.6%
90                 172    4.1%
68                 151    3.6%
100                98     2.3%
Other values (62)  1686   39.8%

Minimum 10 values

Value  Count  Frequency (%)
44     1      < 0.1%
45     2      < 0.1%
46     1      < 0.1%
47     1      < 0.1%
48     5      0.1%
50     22     0.5%
51     1      < 0.1%
52     17     0.4%
53     11     0.3%
54     12     0.3%

Maximum 10 values

Value  Count  Frequency (%)
143    1      < 0.1%
140    1      < 0.1%
125    3      0.1%
122    2      < 0.1%
120    7      0.2%
115    5      0.1%
112    3      0.1%
110    36     0.8%
108    8      0.2%
107    4      0.1%

glucose
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 143
Distinct (%): 3.7%
Missing: 391
Missing (%): 9.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.95193557
Minimum: 40
Maximum: 394
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 40
5-th percentile: 62
Q1: 71
median: 78
Q3: 87
95-th percentile: 108
Maximum: 394
Range: 354
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 23.95842785
Coefficient of variation (CV): 0.2923473093
Kurtosis: 58.7214474
Mean: 81.95193557
Median Absolute Deviation (MAD): 8
Skewness: 6.217638805
Sum: 315433
Variance: 574.0062651
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
75                  193    4.6%
77                  167    3.9%
73                  156    3.7%
80                  153    3.6%
70                  152    3.6%
83                  151    3.6%
78                  148    3.5%
74                  141    3.3%
76                  127    3.0%
85                  126    3.0%
Other values (133)  2335   55.1%
(Missing)           391    9.2%

Minimum 10 values

Value  Count  Frequency (%)
40     2      < 0.1%
43     1      < 0.1%
44     2      < 0.1%
45     4      0.1%
47     3      0.1%
48     1      < 0.1%
50     3      0.1%
52     2      < 0.1%
53     5      0.1%
54     5      0.1%

Maximum 10 values

Value  Count  Frequency (%)
394    2      < 0.1%
386    1      < 0.1%
370    1      < 0.1%
368    1      < 0.1%
348    1      < 0.1%
332    1      < 0.1%
325    1      < 0.1%
320    1      < 0.1%
297    1      < 0.1%
294    1      < 0.1%

Heart-Att
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

0: 3596
1: 644

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4240
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Length

[Figure: histogram of lengths of the category]

Pie chart

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring characters

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4240   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  4240   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4240   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Interactions

[Figures: 64 pairwise interaction plots, not reproduced in this text]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
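
The calculation above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code the report generator runs; the function name `pearson_r` is an assumption, and the input values are the Systolic BP and Diastolic BP readings from the first five rows of the Sample section.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Systolic BP and Diastolic BP from the first five sample rows of this report.
systolic = [106.0, 121.0, 127.5, 150.0, 130.0]
diastolic = [70.0, 81.0, 80.0, 95.0, 84.0]
r = pearson_r(systolic, diastolic)

# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([2.0 * s + 10.0 for s in systolic], diastolic)
```

Five rows are far too few to say anything about the full dataset; the point is only to show the arithmetic and the invariance property.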

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
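
As a rough sketch of that definition, the snippet below ranks each variable (tied values share the mean of their positions) and then applies the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are assumptions made for illustration, and the data are made up.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)      # ranks agree perfectly
r_linear = pearson_r(x, y)    # the raw values are not on a straight line
```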

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
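
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with the function name `kendall_tau_a` and the data chosen purely for illustration. Library implementations commonly use tie-adjusted variants such as tau-b, so their values can differ from this one when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(zip(x, y), 2))
    for (x1, y1), (x2, y2) in pairs:
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts as neither
    return (concordant - discordant) / len(pairs)


# Made-up data: one adjacent swap among four observations.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

With four observations there are six pairs; one of them is discordant and five are concordant, so this example gives (5 - 1) / 6.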

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
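
For readers who want to see the correction spelled out, here is a sketch of the bias-corrected estimator as it is usually stated: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at 0, and the row and column counts r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). The function name `cramers_v_corrected` and the contingency tables are assumptions for illustration; this is not necessarily the exact code path the report generator uses.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table with a single row or column
    return sqrt(phi2_corr / denom)


# Hypothetical 2x2 counts, not taken from this dataset.
independent = cramers_v_corrected([[10, 10], [10, 10]])
perfect = cramers_v_corrected([[50, 0], [0, 50]])
```

The two hypothetical tables mark the ends of the scale: equal counts in every cell give no association, and counts concentrated on the diagonal give perfect association.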

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
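
One way to read the nullity correlation mentioned above is as a Pearson correlation between "is missing" indicator variables. The sketch below assumes that interpretation; the function name `nullity_correlation`, the use of `None` for missing values, and the records themselves are all hypothetical.

```python
from math import sqrt


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because its indicator then has zero variance.
    """
    a = [1.0 if row[col_a] is None else 0.0 for row in rows]
    b = [1.0 if row[col_b] is None else 0.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    if ss_a == 0 or ss_b == 0:
        return None
    return cov / sqrt(ss_a * ss_b)


# Hypothetical records: glucose and BP Meds are missing in the same rows.
records = [
    {"glucose": 77.0, "BP Meds": 0.0, "age": 39.0},
    {"glucose": None, "BP Meds": None, "age": 46.0},
    {"glucose": 70.0, "BP Meds": 0.0, "age": 48.0},
    {"glucose": None, "BP Meds": None, "age": 61.0},
]
together = nullity_correlation(records, "glucose", "BP Meds")
undefined = nullity_correlation(records, "glucose", "age")
```

A value near +1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a column with no missing values has no defined nullity correlation.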

Sample

First rows

   Gender   age  education  currentSmoker  cigsPerDay  BP Meds  prevalentStroke  prevalentHyp  diabetes  tot cholesterol  Systolic BP  Diastolic BP    BMI  heartRate  glucose  Heart-Att
0    Male  39.0        4.0            0.0         0.0      0.0              0.0           0.0       0.0            195.0        106.0          70.0  26.97       80.0     77.0          0
1  Female  46.0        2.0            0.0         0.0      0.0              0.0           0.0       0.0            250.0        121.0          81.0  28.73       95.0     76.0          0
2    Male  48.0        1.0            1.0        20.0      0.0              0.0           0.0       0.0            245.0        127.5          80.0  25.34       75.0     70.0          0
3  Female  61.0        3.0            1.0        30.0      0.0              0.0           1.0       0.0            225.0        150.0          95.0  28.58       65.0    103.0          1
4  Female  46.0        3.0            1.0        23.0      0.0              0.0           0.0       0.0            285.0        130.0          84.0  23.10       85.0     85.0          0
5  Female  43.0        2.0            0.0         0.0      0.0              0.0           1.0       0.0            228.0        180.0         110.0  30.30       77.0     99.0          0
6  Female  63.0        1.0            0.0         0.0      0.0              0.0           0.0       0.0            205.0        138.0          71.0  33.11       60.0     85.0          1
7  Female  45.0        2.0            1.0        20.0      0.0              0.0           0.0       0.0            313.0        100.0          71.0  21.68       79.0     78.0          0
8    Male  52.0        1.0            0.0         0.0      0.0              0.0           1.0       0.0            260.0        141.5          89.0  26.36       76.0     79.0          0
9    Male  43.0        1.0            1.0        30.0      0.0              0.0           1.0       0.0            225.0        162.0         107.0  23.61       93.0     88.0          0

Last rows

      Gender   age  education  currentSmoker  cigsPerDay  BP Meds  prevalentStroke  prevalentHyp  diabetes  tot cholesterol  Systolic BP  Diastolic BP    BMI  heartRate  glucose  Heart-Att
4230  Female  56.0        1.0            1.0         3.0      0.0              0.0           1.0       0.0            268.0        170.0         102.0  22.89       57.0      NaN          0
4231    Male  58.0        3.0            0.0         0.0      0.0              0.0           1.0       0.0            187.0        141.0          81.0  24.96       80.0     81.0          0
4232    Male  68.0        1.0            0.0         0.0      0.0              0.0           1.0       0.0            176.0        168.0          97.0  23.14       60.0     79.0          1
4233    Male  50.0        1.0            1.0         1.0      0.0              0.0           1.0       0.0            313.0        179.0          92.0  25.97       66.0     86.0          1
4234    Male  51.0        3.0            1.0        43.0      0.0              0.0           0.0       0.0            207.0        126.5          80.0  19.71       65.0     68.0          0
4235  Female  48.0        2.0            1.0        20.0      NaN              0.0           0.0       0.0            248.0        131.0          72.0  22.00       84.0     86.0          0
4236  Female  44.0        1.0            1.0        15.0      0.0              0.0           0.0       0.0            210.0        126.5          87.0  19.16       86.0      NaN          0
4237  Female  52.0        2.0            0.0         0.0      0.0              0.0           0.0       0.0            269.0        133.5          83.0  21.47       80.0    107.0          0
4238    Male  40.0        3.0            0.0         0.0      0.0              0.0           1.0       0.0            185.0        141.0          98.0  25.60       67.0     72.0          0
4239  Female  39.0        3.0            1.0        30.0      0.0              0.0           0.0       0.0            196.0        133.0          86.0  20.91       85.0     80.0          0